National Repository of Grey Literature
Automatic Creation of Corpora
Šantavý, Marek ; Černocký, Jan (referee) ; Smrž, Pavel (advisor)
This thesis presents the tagging and formatting of a text corpus. It builds a layer on top of suitably represented documents so that they can be compared with one another to determine how similar they are. Tools for near-duplicate detection form the basis of an automated system for creating and expanding an existing text corpus. Two basic approaches are available, and the choice between them depends on the significance of the outcome. New text data is acquired with a web crawling tool.
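The abstract does not say which near-duplicate technique the thesis uses. As a rough illustration only, the sketch below assumes one common approach: word shingling with Jaccard similarity. The function names, the shingle size `k`, and the `threshold` value are all hypothetical and are not taken from the thesis.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (word n-grams) in a text."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield a single shingle, or none if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # Two empty documents are treated as identical.
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8, k=3):
    """Flag two documents as near-duplicates (threshold is an assumed value)."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

In a corpus-building pipeline of the kind the abstract describes, a check like this could be run on each newly crawled document against the existing corpus before the document is added. Exhaustive pairwise comparison scales poorly, so larger corpora typically rely on approximate methods instead.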
